A z-test is a statistical test used to determine whether there is a significant difference between a sample statistic and a population parameter, under the assumption that the population variance is known.
Z-test: Foundation in Standardization
Known Population Variance: The Z-test is used for hypothesis testing when the population variance is known and the sample size is large (a common rule of thumb is n > 30). Under these conditions the test statistic follows the standard normal distribution (Z-distribution), which makes the Z-test a simple introduction to hypothesis testing.
Standard Normal Distribution: The Z-test introduces the concept of the standard normal distribution, a critical foundational concept in statistics. Understanding how to standardize a score using the Z-distribution is fundamental and applies to many areas in statistics.
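Standardizing a score can be sketched in a few lines of Python. This is a minimal illustration rather than part of the original analysis: the helper name `standardize` and the example weight of 85 kg are hypothetical, and the sketch assumes the standard library's `statistics.NormalDist` is an acceptable stand-in for a Z-table.

```python
from statistics import NormalDist

def standardize(x: float, mean: float, sd: float) -> float:
    """Return the z-score of x for a distribution with the given mean and sd."""
    if sd <= 0:
        raise ValueError("sd must be positive")
    return (x - mean) / sd

# Hypothetical score: a weight of 85 kg when the mean is 75 kg and the sd is 10 kg.
z = standardize(85, mean=75, sd=10)

# Proportion of the standard normal distribution lying below that z-score.
prob_below = NormalDist().cdf(z)

print(z, round(prob_below, 4))
```

A z-score of 1 means the score sits one standard deviation above the mean, with roughly 84% of the standard normal distribution below it.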
Large Sample Assumption: The Z-test applies when the sample size is large enough for the sampling distribution of the mean to be approximately normal, as the Central Limit Theorem guarantees. This concept is easier for beginners to grasp before moving on to situations where the sample size is small and the population variance is unknown.
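The Central Limit Theorem can be checked with a small simulation. The sketch below is an illustration under stated assumptions, not part of the original example: the exponential population (mean 1, standard deviation 1), the 5000 repetitions, and the fixed seed are all arbitrary choices made to keep it short and repeatable.

```python
import random
from statistics import fmean, stdev

random.seed(0)  # fixed seed so the sketch is repeatable

def sample_mean(n: int) -> float:
    """Mean of n draws from a skewed (exponential) population with mean 1 and sd 1."""
    return fmean([random.expovariate(1.0) for _ in range(n)])

n = 31
means = [sample_mean(n) for _ in range(5000)]

# The CLT predicts that the sample means centre on the population mean (1)
# with a spread close to the standard error, sd / sqrt(n).
expected_se = 1 / n ** 0.5
print(round(fmean(means), 3), round(stdev(means), 3), round(expected_se, 3))
```

Even though the underlying population is strongly skewed, the simulated sample means should cluster around 1 with a spread near 1/√31, which is the behavior the Z-test relies on.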
Z-test Example problem
Let’s consider a research question about the average weight of adults in a particular city. Suppose previous studies indicate that the average weight of adults in this city is 75 kg. We want to test whether a new sample of adults from a specific neighborhood has a significantly different average weight, which would suggest that the neighborhood has factors influencing weight.
Research question: Is the average weight of adults in this specific neighborhood different from the general city average of 75 kg, given that the population standard deviation is known to be 10 kg?
Hypothesis
- Null Hypothesis (\(H_0\)): The average weight of adults in the neighborhood is 75 kg (\(\mu = 75\ \text{kg}\)).
- Alternative Hypothesis (\(H_a\)): The average weight of adults in the neighborhood is not 75 kg (\(\mu \neq 75\ \text{kg}\)).
Data Collection
Suppose we collect a sample of 31 adults from the neighborhood and measure their weights. The sample provides the following statistics:
- Sample mean weight (\(\bar{x}\)) = 74.52 kg
- Population standard deviation (\(\sigma\)) is known to be 10 kg (from previous extensive studies)
- Sample size (\(n\)) = 31
The sample values (in kg) are:
72, 75, 71, 74, 78, 79, 72, 73, 76, 77, 70, 79, 71, 74, 78, 72, 75, 71, 74, 78, 79, 72, 73, 76, 77, 70, 79, 71, 74, 78, 72
Let’s calculate the sample mean, Z-statistic, and p-value for this data set.
Calculate the Sample Mean (\(\bar{x}\)):
First determine the sample size \(n\), then calculate the sample mean \(\bar{x}\):
\[
\bar{x} = \frac{\text{Sum of all sample values}}{n}
\]
\[
\bar{x} = \frac{72 + 75 + 71 + \cdots + 74 + 78 + 72}{31} = \frac{2310}{31} \approx 74.52
\]
The sample mean (\(\bar{x}\)) is approximately 74.52 kg.
Calculate Z-Statistic:
Given:
- \(\bar{x} = 74.52\) kg (sample mean),
- \(\mu = 75\) kg (population mean),
- \(\sigma = 10\) kg (population standard deviation),
- \(n = 31\) (sample size).
We will use the Z-test formula:
\[
Z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}
\]
\[
Z = \frac{74.52 - 75}{10 / \sqrt{31}} \approx -0.269
\]
The calculated Z-statistic is approximately -0.269.
- The Z-statistic tells us how many standard errors (\(\sigma / \sqrt{n}\)) the sample mean lies from the hypothesized population mean.
- A Z-statistic of -0.269 indicates that the sample mean is only about 0.269 standard errors below the population mean.
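The role of the standard error in the Z-statistic can be sketched in Python. This is an illustrative sketch only: the helper name `z_statistic` and the alternative sample sizes 100 and 1000 are hypothetical, chosen to show how the same gap between sample mean and population mean produces a larger |Z| as n grows.

```python
import math

def z_statistic(sample_mean, population_mean, population_sd, n):
    """One-sample Z-statistic: distance of the sample mean from the
    hypothesized mean, measured in standard errors (sigma / sqrt(n))."""
    standard_error = population_sd / math.sqrt(n)
    return (sample_mean - population_mean) / standard_error

# Unrounded sample mean from the example: the 31 values sum to 2310.
x_bar = 2310 / 31

# n = 31 is the example's sample size; 100 and 1000 are hypothetical.
for n in (31, 100, 1000):
    se = 10 / math.sqrt(n)
    z = z_statistic(x_bar, 75, 10, n)
    print(f"n = {n:4d}  standard error = {se:.3f}  Z = {z:.3f}")
```

With n = 31 the sketch reproduces the Z-statistic of about -0.269; with larger samples the standard error shrinks, so the same difference in means sits further from zero on the Z scale.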
Calculate P-Value
To determine whether this difference is statistically significant, we can compare the absolute value of the Z-statistic to a critical value from a Z-table. At a significance level of 0.05, the two-tailed critical values are approximately ±1.96. Equivalently, we can compute the two-tailed p-value, \(p = 2\,(1 - \Phi(|Z|)) \approx 0.788\), where \(\Phi\) is the standard normal cumulative distribution function, and compare it to 0.05.
- Since the absolute value of our Z-statistic (0.269) is much less than 1.96, and the p-value (about 0.788) is well above 0.05, we fail to reject the null hypothesis. This means there is not enough statistical evidence to conclude that the average weight of adults in the neighborhood differs from the city average of 75 kg.
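The critical-value comparison and the p-value comparison lead to the same decision, which can be sketched as follows. This is a minimal sketch that assumes the standard library's `statistics.NormalDist` in place of a printed Z-table; the variable names are illustrative.

```python
from statistics import NormalDist

alpha = 0.05   # significance level used in the text
z = -0.269     # Z-statistic from the worked example

std_normal = NormalDist()  # standard normal: mean 0, sd 1

# Two-tailed critical value: the point that leaves alpha / 2 in the upper tail.
z_critical = std_normal.inv_cdf(1 - alpha / 2)

# Two-tailed p-value: probability of a |Z| at least this large under H0.
p_value = 2 * (1 - std_normal.cdf(abs(z)))

# Both decision rules should agree.
reject_by_critical_value = abs(z) > z_critical
reject_by_p_value = p_value < alpha

print(round(z_critical, 2), round(p_value, 3))
print(reject_by_critical_value, reject_by_p_value)
```

Here the critical value comes out near 1.96 and the p-value near 0.788, so neither rule rejects the null hypothesis.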
Z-Test calculation using Excel:
Download the Excel file link here
Z-test calculation with R:
sample_weights <- c(72, 75, 71, 74, 78, 79, 72, 73, 76, 77, 70, 79, 71, 74, 78, 72, 75, 71, 74, 78, 79, 72, 73, 76, 77, 70, 79, 71, 74, 78, 72)
alpha <- 0.05
# Calculate the sample size
sample_size <- length(sample_weights)
sample_size
# Calculate the sample mean
sample_mean <- mean(sample_weights)
sample_mean
# Define population parameters
population_mean <- 75
population_sd <- 10
# Calculate the Z-statistic
z_statistic <- (sample_mean - population_mean) / (population_sd / sqrt(sample_size))
z_statistic
# Calculate the P-value for the two-tailed test
p_value <- 2 * (1 - pnorm(abs(z_statistic)))
p_value
# Compare the p-value with the significance level
if (p_value < alpha) {
  cat("Reject null hypothesis\n")
} else {
  cat("Do not reject null hypothesis\n")
}
Output: Do not reject null hypothesis
Z-test calculation with Python:
import numpy as np
from scipy.stats import norm
sample_weights = np.array([72, 75, 71, 74, 78, 79, 72, 73, 76, 77, 70, 79, 71, 74, 78, 72, 75, 71, 74, 78, 79, 72, 73, 76, 77, 70, 79, 71, 74, 78, 72])
alpha = 0.05
# Sample size
sample_size = len(sample_weights)
sample_size
# Sample mean
sample_mean = np.mean(sample_weights)
sample_mean
# Population parameters
population_mean = 75
population_sd = 10
# Z-statistic
z_statistic = (sample_mean - population_mean) / (population_sd / np.sqrt(sample_size))
z_statistic
# P-value for the two-tailed test
p_value = 2 * norm.sf(np.abs(z_statistic))
p_value
# Compare the p-value with the significance level
if p_value < alpha:
    print("Reject null hypothesis")
else:
    print("Do not reject null hypothesis")
Output: Do not reject null hypothesis
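The individual steps can also be wrapped in one reusable function. The sketch below is one possible arrangement, not the original author's code: the names `ZTestResult` and `one_sample_z_test` are hypothetical, and it uses the standard library's `statistics.NormalDist` instead of `scipy.stats.norm` to stay dependency-free.

```python
import math
from statistics import NormalDist, fmean
from typing import NamedTuple, Sequence

class ZTestResult(NamedTuple):
    z: float
    p_value: float
    reject_null: bool

def one_sample_z_test(sample: Sequence[float], population_mean: float,
                      population_sd: float, alpha: float = 0.05) -> ZTestResult:
    """Two-tailed one-sample Z-test with a known population standard deviation."""
    if len(sample) == 0:
        raise ValueError("sample must not be empty")
    if population_sd <= 0:
        raise ValueError("population_sd must be positive")
    standard_error = population_sd / math.sqrt(len(sample))
    z = (fmean(sample) - population_mean) / standard_error
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return ZTestResult(z, p_value, p_value < alpha)

sample_weights = [72, 75, 71, 74, 78, 79, 72, 73, 76, 77, 70, 79, 71, 74, 78,
                  72, 75, 71, 74, 78, 79, 72, 73, 76, 77, 70, 79, 71, 74, 78, 72]

result = one_sample_z_test(sample_weights, population_mean=75, population_sd=10)
print(result)
```

Returning the Z-statistic, p-value, and decision together keeps the three quantities consistent and makes it easy to rerun the test on another neighborhood's sample.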